
Apply LERC valid-mask in GPU decode path (depends on #1529) #1535

Open
brendancol wants to merge 2 commits into xarray-contrib:main from brendancol:fix/lerc-valid-mask-gpu

Conversation

@brendancol
Contributor

Summary

Follow-up to #1529. The CPU LERC reader applies the LERC valid-mask
and writes nodata into masked positions, but the GPU LERC tile-decode
path in _gpu_decode.py was still dropping the mask. A masked pixel
read back as 0 on GPU and as NaN or the sentinel on CPU.

This PR fixes the GPU side:

  • The GPU LERC branch calls lerc_decompress_with_mask per tile and
    keeps any returned mask.
  • After tile assembly, an invalid-mask sized to the output image is
    built on host, copied to the GPU once, and used to write the
    resolved fill value into masked positions.
  • gpu_decode_tiles and gpu_decode_tiles_from_file get a
    masked_fill= kwarg that read_geotiff_gpu populates via
    _resolve_masked_fill(ifd.nodata_str, file_dtype) when compression
    is LERC.
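The per-tile half of the change above can be sketched in plain Python. The name lerc_decompress_with_mask and its (bytes, mask-or-None) return shape come from #1529; the loop body here is a stand-in for the real GPU branch, which hands the decoded bytes to the cupy assembly kernels:

```python
# Hypothetical sketch of the per-tile decode loop: decode each tile and
# keep any returned valid-mask alongside the data. A None mask means the
# tile is fully valid and needs no fill pass later.
def decode_lerc_tiles(raw_tiles, decompress_with_mask):
    decoded, masks = [], []
    for raw in raw_tiles:
        data, valid_mask = decompress_with_mask(raw)
        decoded.append(data)
        masks.append(valid_mask)
    return decoded, masks
```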

Depends on #1529 (uses lerc_decompress_with_mask and
_resolve_masked_fill introduced there). Land #1529 first; this PR
should rebase cleanly off main once it is merged.

Test plan

  • pytest xrspatial/geotiff/tests/test_lerc_valid_mask_gpu.py passes (4 tests).
  • pytest xrspatial/geotiff/tests/test_lerc.py xrspatial/geotiff/tests/test_lerc_max_z_error.py xrspatial/geotiff/tests/test_gpu_byteswap_1508.py xrspatial/geotiff/tests/test_lerc_valid_mask.py passes (38 tests).
  • Manual repro: float32 LERC TIFF with a masked pixel read via read_geotiff_gpu returns NaN at the masked position (matches CPU); without the fix it returned 0.0.

brendancol added 2 commits May 8, 2026 13:45
lerc.decode returns (rc, data, valid_mask, ...) but the wrapper only
forwarded data.tobytes(). GDAL writes LERC TIFFs whose masked pixels
are zero-filled in the data array, so downstream code that masks by
nodata silently sees those zeros as real measurements.

What changed:

_compression.py: new lerc_decompress_with_mask(data) ->
(bytes, valid_mask_or_None). lerc_decompress now calls it and drops
the mask, preserving its existing signature. An all-True mask
collapses to None so callers skip the fill pass.
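The collapse rule can be sketched on its own (assumed interface: the decoder hands back a flat list of bools, or None when the payload carried no mask):

```python
# Sketch of the all-True collapse described above: a mask with no False
# entries carries no information, so return None and let callers skip
# the fill pass entirely.
def collapse_mask(valid_mask):
    if valid_mask is not None and all(valid_mask):
        return None
    return valid_mask
```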

_reader.py: _decode_strip_or_tile takes a separate path for LERC,
calls the new wrapper, and writes nodata into masked positions after
reshape. A small _resolve_masked_fill helper reads ifd.nodata_str and
falls back to NaN for float dtypes or 0 for integer dtypes when no
GDAL_NODATA is set. Each call site (strip reader, tile reader,
COG-over-HTTP reader) passes a precomputed masked_fill.
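The fallback rules can be restated as a small sketch. This is a simplified stand-in for _resolve_masked_fill, with the dtype reduced to a name string for illustration:

```python
import math

def resolve_masked_fill(nodata_str, dtype):
    # Hypothetical re-creation of the fallback rules described above:
    # use GDAL_NODATA when present, else NaN for float dtypes and 0 for
    # integer dtypes.
    if nodata_str is not None:
        value = float(nodata_str)
        return value if dtype.startswith("float") else int(value)
    return math.nan if dtype.startswith("float") else 0
```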

tests/test_lerc_valid_mask.py: 8 new tests. Wrapper tests cover the
no-mask path, the all-True-mask collapse, and a partial mask. TIFF
round-trip tests cover float32 with NaN nodata, float32 with -9999,
uint16 with 65535, and a regression that an unmasked LERC file still
round-trips bit-exact. The round-trip tests monkeypatch lerc_compress
to inject a per-tile mask through a predicate, since the writer
hard-codes hasMask=False.
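The injection pattern can be sketched generically. The wrapper shape below is hypothetical (the real fixture patches lerc_compress and drives lerc.encode); the idea is that a pixel-index predicate decides the valid-mask each tile is encoded with:

```python
# Hypothetical sketch of the monkeypatch pattern: wrap the real
# compressor so every tile carries a valid-mask produced by a
# caller-supplied predicate (pixel index -> bool).
def with_injected_mask(real_compress, predicate, n_pixels):
    def compress(tile_bytes):
        mask = [predicate(i) for i in range(n_pixels)]
        return real_compress(tile_bytes, valid_mask=mask)
    return compress
```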

Option A was chosen (plumb nodata into the decode path). Option B
would have required changing decompress()'s return shape, which has
callers in tests and in _gpu_decode.py.

Out of scope: the GPU LERC path in _gpu_decode.py still drops the
mask. The constraints excluded changes there.

The CPU LERC reader from xarray-contrib#1529 honours the LERC valid-mask and writes
the file's nodata sentinel into masked pixels. The GPU LERC tile-decode
path was still dropping the mask, so masked pixels read back as 0 on
GPU but as NaN or the sentinel on CPU. Same bug, GPU side.

Changes:

_gpu_decode.py: the LERC branch now calls lerc_decompress_with_mask
per tile and keeps any returned valid-mask. After predictor decode and
tile assembly, _apply_lerc_mask_fill builds an invalid mask on host
(matching the GPU assembly kernel's tile-grid layout), copies it to
GPU once, and overwrites masked positions with the resolved fill
value. Tiles LERC reports as fully valid skip the host work, so the
no-mask path stays zero-copy.
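The host-side assembly and fill steps can be sketched without cupy. The function names and plain-Python lists here are stand-ins for _apply_lerc_mask_fill and the device arrays; the tile-grid indexing mirrors the layout described above:

```python
# Hypothetical sketch: fold per-tile valid-masks into one image-sized
# invalid mask (row-major tile grid), then overwrite flagged positions
# with the resolved fill value in a single pass.
def build_invalid_mask(tile_masks, tiles_across, tile_w, tile_h, img_w, img_h):
    invalid = [[False] * img_w for _ in range(img_h)]
    for idx, mask in enumerate(tile_masks):
        if mask is None:  # fully valid tile: no host work needed
            continue
        ty, tx = divmod(idx, tiles_across)
        for row in range(tile_h):
            for col in range(tile_w):
                y, x = ty * tile_h + row, tx * tile_w + col
                # edge tiles may overhang the image; clip them
                if y < img_h and x < img_w and not mask[row * tile_w + col]:
                    invalid[y][x] = True
    return invalid

def apply_fill(image, invalid, fill):
    for y, row in enumerate(invalid):
        for x, bad in enumerate(row):
            if bad:
                image[y][x] = fill
    return image
```

In the real path this mask is copied to the GPU once and the fill is done device-side; the sketch only shows the layout logic.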

gpu_decode_tiles and gpu_decode_tiles_from_file get a masked_fill
keyword that is forwarded through. read_geotiff_gpu computes it via
_resolve_masked_fill(ifd.nodata_str, file_dtype) for LERC sources.
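The call-site plumbing can be sketched as follows. The dict-shaped ifd, the resolve_fill helper, and the decode_tiles callable are illustrative stand-ins; the LERC constant is the TIFF Compression tag value registered for LERC (34887 per GDAL):

```python
import math

LERC = 34887  # TIFF Compression tag value for LERC (per GDAL)

def resolve_fill(nodata_str, dtype):
    # simplified stand-in for _resolve_masked_fill (see #1529)
    if nodata_str is not None:
        return float(nodata_str)
    return math.nan if dtype.startswith("float") else 0

def read_geotiff_gpu_sketch(ifd, decode_tiles):
    # only LERC sources need a fill; other codecs keep masked_fill=None
    masked_fill = None
    if ifd["compression"] == LERC:
        masked_fill = resolve_fill(ifd["nodata_str"], ifd["dtype"])
    return decode_tiles(ifd["tiles"], masked_fill=masked_fill)
```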

tests/test_lerc_valid_mask_gpu.py: 4 tests covering float32+NaN,
float32+sentinel, uint16+sentinel, and the no-mask regression. Each
compares read_geotiff_gpu output to read_to_array output for the same
file. Skipped unless cupy + CUDA + lerc are available.
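One subtlety in comparing the two readers: when the resolved fill is NaN, ordinary elementwise equality fails because NaN != NaN. A NaN-aware comparison along these lines (a hypothetical helper, shown over flat Python lists) is what the tests need:

```python
import math

# NaN-aware elementwise comparison for checking GPU output against the
# CPU reader: two NaNs count as equal, everything else uses ==.
def arrays_match(a, b):
    return len(a) == len(b) and all(
        (isinstance(x, float) and isinstance(y, float)
         and math.isnan(x) and math.isnan(y)) or x == y
        for x, y in zip(a, b)
    )
```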

Out of scope: the encode side. The xrspatial writer still hard-codes
hasMask=False; the tests reuse the lerc_compress monkeypatch fixture
from the CPU PR to inject a valid-mask through lerc.encode directly.
@github-actions bot added the performance label (PR touches performance-sensitive code) May 8, 2026
@brendancol
Contributor Author

@copilot review
